A Homotopy Based Approach to Unconstrained Optimization
by Mordecai Avriel and Jerald P. Dauer

1. Introduction

We consider the problem of finding the minimum value of a real valued function $f$ defined on $\mathbb{R}^n$ from a homotopic point of view. Davidenko [5] approached this problem by embedding the gradient of $f$, denoted by $F$, into a family of operators and solving a resulting differential equation to obtain a stationary point of $f$ (see Broyden [4] for a discussion of this technique with further references). Boggs [2] examined the use of A-stable integration techniques with this approach and developed specific algorithms for solving the system $F(x) = 0$.

In Section 2 we use the Davidenko embedding method with the homotopy

$$H(t, x) = e^{-t}(x - x^0) + (1 - e^{-t})F(x), \quad t \ge 0.$$
The corresponding differential equation is then approximated in Section 3, using Euler's rule, to obtain the discrete iteration formula

$$x^{k+1} = x^k - [(1 - \lambda_k)I + \lambda_k F_x(x^k)]^{-1}F(x^k), \quad \lambda_k \in \mathbb{R}. \tag{1.1}$$

Convergence properties of this iteration scheme for appropriately chosen direction parameters $\lambda_k$ are then developed.

In Section 4 the convergence properties of (1.1) are considered in the case where $f$ is a quadratic function and the Hessian matrix $F_x(x^k)$ is approximated by a matrix that satisfies the secant (quasi-Newton) relation. A quadratic termination property is developed that is similar to that of some variable metric methods. Section 5 examines the choice of the direction parameter $\lambda_k$ and develops several global convergence results.


Formula (1.1) is an iteration scheme containing the direction parameter $\lambda_k$, which relates the gradient and Newton directions. A similar formula of the form

$$x^{k+1} = x^k - [\mu_k I + F_x(x^k)]^{-1}F(x^k), \quad \mu_k > 0,$$

was developed by Levenberg [11] and by Marquardt [13] for least squares estimation problems. The Levenberg-Marquardt type of algorithm for minimization problems was also considered by Goldfeld, Quandt and Trotter [9] and by Luenberger [12, p. 157]. Hebden [10] further developed the algorithm due to Goldfeld et al., using an approach which is particularly suited for large systems when second derivatives are available. A related "dogleg" strategy was developed by Powell [19, 20, 21] and modified by Dennis and Mei [6]. Another approach, based on the gradient path corresponding to the local quadratic approximation of $f$, was developed by Vial and Zang [25].

The dampened version of iteration formula (1.1),

$$x^{k+1} = x^k - \alpha_k[(1 - \lambda_k)I + \lambda_k F_x(x^k)]^{-1}F(x^k),$$

is equivalent for $0 < \lambda_k \le 1$ to the dampened version of the Levenberg-Marquardt formula,

$$x^{k+1} = x^k - r_k[\mu_k I + F_x(x^k)]^{-1}F(x^k),$$

where $\mu_k = (1 - \lambda_k)/\lambda_k$ and $r_k = \alpha_k/\lambda_k$, since $(1 - \lambda_k)I + \lambda_k F_x(x^k) = \lambda_k[\mu_k I + F_x(x^k)]$. However, the analysis of this paper will not impose such a restriction on $\lambda_k$. In fact, nonpositive values of $\lambda_k$, and particularly $\lambda_k = 0$, which gives the gradient direction, might be desirable for (1.1) at a given iteration depending on the eigenvalues of $F_x(x^k)$. This differs from the motivation and implementation of the Levenberg-Marquardt formula; see [9, 10, 11, 13].
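A quick numerical check of this equivalence is sketched below. It assumes NumPy and uses randomly generated, hypothetical data: a symmetric positive definite matrix `H` standing in for $F_x(x^k)$ and a vector `F` standing in for $F(x^k)$. One dampened step of (1.1) is compared with one dampened Levenberg-Marquardt step using $\mu_k = (1 - \lambda_k)/\lambda_k$ and $r_k = \alpha_k/\lambda_k$.

```python
import numpy as np

def homotopy_step(x, F, H, lam, alpha):
    """Dampened step of (1.1): x - alpha * [(1 - lam) I + lam H]^{-1} F."""
    M = (1.0 - lam) * np.eye(len(x)) + lam * H
    return x - alpha * np.linalg.solve(M, F)

def lm_step(x, F, H, mu, r):
    """Dampened Levenberg-Marquardt step: x - r * [mu I + H]^{-1} F."""
    return x - r * np.linalg.solve(mu * np.eye(len(x)) + H, F)

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
H = B @ B.T + np.eye(4)      # hypothetical SPD stand-in for F_x(x^k)
x = rng.standard_normal(4)   # hypothetical current point x^k
F = rng.standard_normal(4)   # hypothetical stand-in for F(x^k)

lam, alpha = 0.3, 0.7        # any 0 < lam <= 1 works
mu, r = (1.0 - lam) / lam, alpha / lam
step_a = homotopy_step(x, F, H, lam, alpha)
step_b = lm_step(x, F, H, mu, r)
print(np.allclose(step_a, step_b))   # True
```

The check only confirms the algebraic identity; it says nothing about the case $\lambda_k \le 0$, where no Levenberg-Marquardt counterpart exists.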


2. Derivation

Let $f : \mathbb{R}^n \to \mathbb{R}$ and consider the unconstrained optimization problem: minimize $f(x)$ over $x \in \mathbb{R}^n$.

In order to solve this problem we let $F(x)$ be the gradient of $f$ at $x$ and consider the system

$$F(x) = 0. \tag{2.1}$$


An approach to solving (2.1) is to embed the operator $F$ into a family of operators $H(s, \cdot)$ with parameter $s$ and solve a corresponding differential equation. To this end we assume that $F$ has a locally Lipschitz continuous (Fréchet) derivative $F_x$.

Following the development of Meyer [14] we let

$$H(s, x) = a(s)G(x) + b(s)F(x), \quad 0 \le s \le 1. \tag{2.2}$$

Here we assume $a, b : [0, 1] \to \mathbb{R}$ and $G : \mathbb{R}^n \to \mathbb{R}^n$ are twice differentiable and chosen so that $H(1, x) = F(x)$ and $H(0, x^0) = 0$ for some known $x^0 \in \mathbb{R}^n$.
If $x(s)$ is a solution of the equation $H(s, x) = 0$ such that $H_x(s, x(s))$ is nonsingular for all $s \in [0, 1]$ then, by the implicit function theorem, $x(s)$ satisfies

$$\frac{dx}{ds} = -H_x(s, x)^{-1}H_s(s, x), \quad x(0) = x^0. \tag{2.3}$$


Conversely, if $x(s)$ is a solution of (2.3) on $[0, 1]$, then $\frac{d}{ds}H(s, x(s)) = 0$ for $s \in [0, 1]$. Hence

$$H(1, x(1)) = H(0, x^0) = 0,$$

and so $x(1)$ is a root of $F(x) = 0$. Thus system (2.1) can be solved by embedding $F$ into the family of operators (2.2) and integrating the corresponding differential equation (2.3).

Conditions for existence and uniqueness of solutions of the differential equation (2.3) on $[0, 1]$ are well known; in particular, see Meyer [14, Section 1].

The particular homotopy we consider is

$$H(s, x) = (1 - s)(x - x^0) + sF(x), \quad 0 \le s \le 1. \tag{2.4}$$


The change of variable $s = 1 - e^{-t}$ gives the homotopy

$$H(t, x) = e^{-t}(x - x^0) + (1 - e^{-t})F(x), \quad t \ge 0, \tag{2.5}$$

and the homotopies (2.4) and (2.5) are equivalent for $s \in [0, 1)$ and $t \ge 0$.
Letting $t \to +\infty$, the solution $x(t)$ corresponding to (2.5) approaches a root of $F(x) = 0$. Recalling that

$$H(t, x(t)) = 0, \tag{2.6}$$

and letting $x' = dx/dt$, we calculate

$$\frac{dH}{dt} = H_t + H_x x' \tag{2.7}$$

$$= [-e^{-t}(x - x^0) + e^{-t}F(x)] + [e^{-t}I + (1 - e^{-t})F_x(x)]x' = 0. \tag{2.8}$$

By (2.6), $e^{-t}(x - x^0) = -(1 - e^{-t})F(x)$, so the first bracket in (2.8) equals $F(x)$. Hence by (2.6) and (2.8) we obtain the differential equation

$$x' = -[e^{-t}I + (1 - e^{-t})F_x(x)]^{-1}F(x), \quad t \ge 0, \quad x(0) = x^0. \tag{2.9}$$


The motivation for considering the homotopy (2.5), with corresponding differential equation (2.9) on the unbounded interval $t \ge 0$, is to avoid the need for high accuracy when approximating the solution $x(t)$ of the differential equation as $t$ approaches its limit. Such accuracy in $x(1)$ is necessary when solving equation (2.3) on $[0, 1]$ (see Boggs [2]).
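To illustrate (2.9), the following sketch integrates it with a small fixed forward-Euler step. It assumes NumPy and a hypothetical strictly convex test function $f(x) = \sum_i (x_i^2/2 + x_i^4/4) - b^T x$, whose gradient $F(x) = x + x^3 - b$ (cube taken componentwise) has a unique root; the vector `b`, the step size `h` and the horizon `t_end` are illustrative choices, not values from the paper. Because $e^{-t}$ is tiny at the end of the horizon, the final point should nearly satisfy $F(x) = 0$.

```python
import numpy as np

b = np.array([1.0, -2.0, 0.5])   # hypothetical data; the root of F has x_2 = -1

def F(x):
    """Gradient of the hypothetical f(x) = sum(x**2/2 + x**4/4) - b.x."""
    return x + x**3 - b

def Fx(x):
    """Hessian of the same f; diagonal and positive definite."""
    return np.diag(1.0 + 3.0 * x**2)

def davidenko_euler(x0, t_end=20.0, h=0.01):
    """Forward Euler for x' = -[e^{-t} I + (1 - e^{-t}) F_x(x)]^{-1} F(x), x(0) = x0."""
    x = np.array(x0, dtype=float)
    n = len(x)
    for k in range(int(round(t_end / h))):
        t = k * h
        M = np.exp(-t) * np.eye(n) + (1.0 - np.exp(-t)) * Fx(x)
        x = x - h * np.linalg.solve(M, F(x))
    return x

x_end = davidenko_euler(np.zeros(3))
print(np.linalg.norm(F(x_end)))   # tiny: x(t) has approached the root of F
```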

3. The iteration formula and convergence for the general case

The differential equation (2.9) motivates (using Euler's rule) the iteration formula

$$x^{k+1} = x^k - [e^{-t_k}I + (1 - e^{-t_k})F_x(x^k)]^{-1}F(x^k). \tag{3.1}$$


By writing $\lambda_k = 1 - e^{-t_k}$, equation (3.1) becomes

$$x^{k+1} = x^k - [(1 - \lambda_k)I + \lambda_k F_x(x^k)]^{-1}F(x^k). \tag{3.2}$$


Here $\lambda_k$ is chosen by an appropriate selection rule, an example of which can be described as follows. Let $h : \mathbb{R} \to \mathbb{R}$ be a twice continuously differentiable function satisfying

$$h(1) = 1, \quad h'(1) = 0. \tag{3.3}$$

Then define the sequence $\{\lambda_k\}$ by

$$\lambda_{k+1} = h(\lambda_k). \tag{3.4}$$


For example, we can take

$$h(\lambda) = 1 - a(\lambda - 1)^2, \quad 0 < a < 1, \tag{3.5}$$

with $|\lambda_0 - 1| \le 1$; the iterates of (3.4) then satisfy $|\lambda_{k+1} - 1| = a|\lambda_k - 1|^2$ and so converge quadratically to 1.
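The quadratic convergence of the rule (3.5) can be seen by iterating it directly. This plain-Python sketch uses the illustrative values $a = 0.5$ and $\lambda_0 = 0.2$ (not values from the paper) and prints each error next to $a$ times the square of the previous error.

```python
def h(lam, a=0.5):
    """Selection rule (3.5): h(lam) = 1 - a * (lam - 1)**2 with 0 < a < 1."""
    return 1.0 - a * (lam - 1.0) ** 2

lam = 0.2                      # lambda_0, chosen so that |lambda_0 - 1| <= 1
errors = []
for k in range(6):
    errors.append(abs(lam - 1.0))
    lam = h(lam)               # (3.4): lambda_{k+1} = h(lambda_k)

for e_prev, e_next in zip(errors, errors[1:]):
    # each new error equals a * (old error)**2
    print(f"{e_prev:.3e} -> {e_next:.3e}   a*e^2 = {0.5 * e_prev**2:.3e}")
```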

In the following we shall deal with rates of convergence of sequences. In this connection we introduce Q- and R-factors for $\{x^k\} \subset \mathbb{R}^n$, a sequence converging to $x^*$. For every $p \ge 1$ we define the Q-factors of $\{x^k\}$ as

$$Q_p\{x^k\} = \begin{cases} 0, & \text{if } x^k = x^* \text{ for all but finitely many } k, \\ \limsup_{k \to \infty} \dfrac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|^p}, & \text{if } x^k \ne x^* \text{ for all but finitely many } k, \\ +\infty, & \text{otherwise}. \end{cases} \tag{3.6}$$

Similarly, the numbers

$$R_p\{x^k\} = \begin{cases} \limsup_{k \to \infty} \|x^k - x^*\|^{1/k}, & \text{if } p = 1, \\ \limsup_{k \to \infty} \|x^k - x^*\|^{1/p^k}, & \text{if } p > 1, \end{cases} \tag{3.7}$$

are called the R-factors of $\{x^k\}$.

When $Q_1\{x^k\} = 0$, the sequence $\{x^k\}$ is said to have a Q-superlinear rate of convergence. If $0 < Q_2\{x^k\} < +\infty$, then $\{x^k\}$ has a Q-quadratic rate of convergence. Similarly, if $R_1\{x^k\} = 0$, the rate of convergence is said to be R-superlinear, and if $0 < R_2\{x^k\} < 1$, the rate is R-quadratic. The relationship between Q- and R-factors and the corresponding convergence rates has been examined by Ortega and Rheinboldt [17]. In particular, Q-quadratic (superlinear) convergence implies R-quadratic (superlinear) convergence (see also Tapia [23]).
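The quantities in (3.6) and (3.7) can be estimated from the tail of an error sequence. The sketch below is only a finite-sample illustration, since a limsup cannot be computed from finitely many terms; it uses a hypothetical error sequence $e_k = \|x^k - x^*\| = 2^{-2^k}$, for which every ratio in (3.6) with $p = 2$ equals 1 and every root in (3.7) with $p = 2$ equals $1/2$.

```python
def q_ratios(errors, p):
    """Ratios e_{k+1} / e_k**p whose limsup is the Q-factor Q_p in (3.6)."""
    return [e1 / e0**p for e0, e1 in zip(errors, errors[1:])]

def r_roots(errors, p):
    """Roots e_k**(1/k) for p = 1, or e_k**(1/p**k) for p > 1, cf. (3.7)."""
    if p == 1:
        return [e ** (1.0 / k) for k, e in enumerate(errors) if k > 0]
    return [e ** (1.0 / p**k) for k, e in enumerate(errors)]

# hypothetical error sequence e_k = 2**(-2**k)
errors = [2.0 ** -(2**k) for k in range(6)]
print(q_ratios(errors, 2))   # every ratio is 1.0, so Q_2 = 1
print(q_ratios(errors, 1))   # ratios shrink toward 0: Q-superlinear
print(r_roots(errors, 2))    # every root is (numerically) 0.5, so R_2 = 1/2
```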

The local convergence of the iteration (3.2) follows from the "consistent approximation" results of Ortega and Rheinboldt [17, p. 357], as stated below.


Theorem 3.1. Let $x^*$ be a solution of $F(x) = 0$ for which $F_x(x^*)$ is nonsingular, and let $N$ be a sufficiently small neighborhood of $x^*$. If $x^0 \in N$ and $\{\lambda_k\}$ are chosen sufficiently close to 1, then the iterates of (3.2) remain in $N$ and converge to $x^*$. Moreover, if $\lim_{k \to \infty} \lambda_k = 1$, then the iterates of (3.2) converge to $x^*$ in a Q-superlinear manner.

In order to further analyze the rate of convergence of the iteration (3.2) using a parameter selection function (3.4), we introduce the operator $S : \mathbb{R}^{n+1} \to \mathbb{R}^n$ defined by

$$S(x, \lambda) = x - [(1 - \lambda)I + \lambda F_x(x)]^{-1}F(x) \tag{3.8}$$

and the operator $T : \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$ defined by

$$T(x, \lambda) = (S(x, \lambda), h(\lambda)). \tag{3.9}$$


Then the formulas (3.2) and (3.4) are equivalent to

$$(x^{k+1}, \lambda_{k+1}) = T(x^k, \lambda_k), \tag{3.10}$$

where $(x^0, \lambda_0)$ are given. Next we have


Proposition 3.2. The operator $S$ has a fixed point at $x^*$ for some fixed $\lambda$ if and only if $F(x^*) = 0$.

Proof. If $F(x^*) = 0$, then from (3.8) we have $S(x^*, \lambda) = x^*$ for all $\lambda$. If $S(x^*, \lambda) = x^*$ in (3.8), the nonsingularity of $[(1 - \lambda)I + \lambda F_x(x^*)]^{-1}$ gives $F(x^*) = 0$. □


From here on we assume that $x^*$ is a solution of system (2.1) and take $\lambda_0$ and $h$ in (3.4) to be chosen so that the sequence $\{\lambda_k\}$ converges to 1 and such that the matrix $[(1 - \lambda_k)I + \lambda_k F_x(x)]$ is invertible for $x$ in an appropriate neighborhood of $x^*$. Let $S_x$ and $S_\lambda$ denote the derivatives of the operator $S$ with respect to $x$ and $\lambda$, respectively.

Proposition 3.3. If the matrix $[(1 - \lambda)I + \lambda F_x(x^*)]$ is invertible for some value $\lambda$, then $S_\lambda(x^*, \lambda) = 0$.

Proof. From (3.8) we have

$$[(1 - \lambda)I + \lambda F_x(x)](x - S(x, \lambda)) = F(x).$$

Thus by differentiating with respect to $\lambda$ we obtain

$$[-I + F_x(x)](x - S(x, \lambda)) - [(1 - \lambda)I + \lambda F_x(x)]S_\lambda(x, \lambda) = 0,$$

and

$$S_\lambda(x, \lambda) = [(1 - \lambda)I + \lambda F_x(x)]^{-1}[-I + F_x(x)](x - S(x, \lambda)).$$

By Proposition 3.2, $F(x^*) = 0$ implies $x^* - S(x^*, \lambda) = 0$, which completes the proof. □

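Proposition 3.4. Let $\{\lambda_k\}$ converge to 1. Then

$$\lim_{k \to \infty} S_x(x^*, \lambda_k) = 0.$$

Proof. For any direction $\eta \in \mathbb{R}^n$,

$$\begin{aligned} S_x(x^*, \lambda)(\eta) &= \lim_{\tau \to 0} \frac{S(x^* + \tau\eta, \lambda) - S(x^*, \lambda)}{\tau} \\ &= \lim_{\tau \to 0} \frac{\tau\eta - [(1 - \lambda)I + \lambda F_x(x^* + \tau\eta)]^{-1}F(x^* + \tau\eta) + [(1 - \lambda)I + \lambda F_x(x^*)]^{-1}F(x^*)}{\tau} \\ &= \eta - \lim_{\tau \to 0} [(1 - \lambda)I + \lambda F_x(x^* + \tau\eta)]^{-1}\frac{F(x^* + \tau\eta)}{\tau}, \end{aligned}$$

since $F(x^*) = 0$. Hence

$$S_x(x^*, \lambda)(\eta) = \eta - [(1 - \lambda)I + \lambda F_x(x^*)]^{-1}F_x(x^*)\eta.$$

Taking $\{\lambda_k\}$ converging to 1, the matrix $[(1 - \lambda_k)I + \lambda_k F_x(x^*)]^{-1}F_x(x^*)$ converges to $F_x(x^*)^{-1}F_x(x^*) = I$, which completes the proof. □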


We are now able to obtain our result on the convergence rate of the iteration defined by (3.2) and (3.4).


Theorem 3.5. Let $h$ satisfy condition (3.3) and choose $\lambda_0$ so that the sequence defined by (3.4) converges to 1. Then the sequence $\{(x^k, \lambda_k)\}$ defined by (3.2) and (3.4) converges locally to $(x^*, 1)$ and this convergence is Q-quadratic.


Proof. Let $DT$ and $D^2T$ denote the first and second derivatives of the operator $T$ defined by (3.9). By (3.3) we have $h'(1) = 0$, and hence Propositions 3.3 and 3.4, together with the continuity of $S_x$, imply $DT(x^*, 1) = 0$. Therefore, Taylor's Theorem and the Schwarz inequality give

$$\|(x^{k+1}, \lambda_{k+1}) - (x^*, 1)\| = \|T(x^k, \lambda_k) - T(x^*, 1)\| \le \tfrac{1}{2}\|D^2T(x^* + \theta(x^k - x^*), 1 + \vartheta(\lambda_k - 1))\|\,\|(x^k, \lambda_k) - (x^*, 1)\|^2$$

for some $\theta$ and $\vartheta$ between 0 and 1. Hence we have

$$\limsup_{k \to \infty} \frac{\|(x^{k+1}, \lambda_{k+1}) - (x^*, 1)\|}{\|(x^k, \lambda_k) - (x^*, 1)\|^2} < +\infty, \tag{3.11}$$

and from (3.11) and (3.6) it follows that the rate of convergence is Q-quadratic. □

As was pointed out by Tapia [23], the above result does not imply that the sequence $\{x^k\}$ itself converges Q-quadratically. However, following Tapia, we note that

$$\|x^k - x^*\| \le \|(x^k, \lambda_k) - (x^*, 1)\|,$$

and so definition (3.7) and Theorem 3.5 give the following result.


Corollary 3.6. Suppose $h$ and $\lambda_0$ are as in Theorem 3.5 and let the sequence $\{(x^k, \lambda_k)\}$ be defined by (3.2) and (3.4). Then the sequence $\{x^k\}$ converges locally to $x^*$ and the rate of convergence is R-quadratic.

We may remark here that, by Theorems 3.1 and 3.5, the iteration formula (3.2) derived from the homotopy (2.5) has a rate of convergence comparable to that of Newton's method, which is Q-quadratic, in a neighborhood of a root of (2.1).
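A minimal sketch of the iteration (3.2) driven by the selection rule (3.4), with $h$ taken from (3.5), follows. It assumes NumPy, the hypothetical test gradient $F(x) = x + x^3 - b$ with the hypothetical vector `b` below, and a starting point and $\lambda_0$ deliberately chosen close to the solution and to 1, since Theorems 3.1 and 3.5 are local results.

```python
import numpy as np

b = np.array([1.0, -2.0, 0.5])   # hypothetical data; the root has x_2 = -1

def F(x):
    """Gradient of the hypothetical f(x) = sum(x**2/2 + x**4/4) - b.x."""
    return x + x**3 - b

def Fx(x):
    """Hessian of the same f (diagonal here)."""
    return np.diag(1.0 + 3.0 * x**2)

def h(lam, a=0.5):
    """Selection rule (3.5)."""
    return 1.0 - a * (lam - 1.0) ** 2

def homotopy_iteration(x0, lam0, iters=10):
    """Iterate (3.2) with lambda_{k+1} = h(lambda_k) as in (3.4); record residual norms."""
    x, lam = np.array(x0, dtype=float), lam0
    n = len(x)
    history = [np.linalg.norm(F(x))]
    for _ in range(iters):
        M = (1.0 - lam) * np.eye(n) + lam * Fx(x)
        x = x - np.linalg.solve(M, F(x))   # step (3.2)
        lam = h(lam)                       # update (3.4)
        history.append(np.linalg.norm(F(x)))
    return x, history

x_final, history = homotopy_iteration([0.5, -0.8, 0.3], lam0=0.9)
for res in history:
    print(f"{res:.3e}")   # residuals drop rapidly once lambda_k is near 1
```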


4. Convergence for quadratic functions

As is well known, the evaluation of the Hessian matrix $F_x(x^k)$ in formula (3.2) can be an expensive calculation. We therefore consider the convergence question where the Hessian matrix is approximated in a quasi-Newton manner; see, for example, [3, 16]. In this approach we approximate $F_x(x^k)$ by a square matrix $A_k$ and define our iteration formula by

$$x^{k+1} = x^k - [(1 - \lambda_k)I + \lambda_k A_k]^{-1}F(x^k). \tag{4.1}$$


We assume that there is an interval $J$, possibly unbounded, such that the matrix $[(1 - \lambda)I + \lambda A_k]^{-1}$ is positive definite for all $\lambda \in J$ and that the function $f(x^{k+1})$, as a function of $\lambda$, achieves its minimum over all $\lambda$ in $J$ at some point $\lambda_k$ interior to $J$. (If $A_k$ is symmetric, the interval $J$ can be completely described in terms of the eigenvalues of $A_k$.) Then at $\lambda = \lambda_k$ we have

$$\frac{d}{d\lambda}f(x^{k+1}) = F(x^{k+1})^T(A_k - I)[(1 - \lambda_k)I + \lambda_k A_k]^{-2}F(x^k) = 0,$$

and therefore

$$(1 - \lambda_k)F(x^{k+1})^T(I - A_k)[(1 - \lambda_k)I + \lambda_k A_k]^{-2}F(x^k) = 0. \tag{4.2}$$
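The expression behind (4.2) follows from $\frac{d}{d\lambda}[(1-\lambda)I + \lambda A_k]^{-1} = -[(1-\lambda)I + \lambda A_k]^{-1}(A_k - I)[(1-\lambda)I + \lambda A_k]^{-1}$ and the fact that $A_k$ commutes with $(1-\lambda)I + \lambda A_k$, which give $\frac{d}{d\lambda}f(x^{k+1}(\lambda)) = F(x^{k+1})^T(A_k - I)[(1-\lambda)I + \lambda A_k]^{-2}F(x^k)$. The sketch below, assuming NumPy and a randomly generated quadratic $f$ and symmetric approximation $A_k$ (hypothetical test data), compares this expression with a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
C = rng.standard_normal((n, n))
Q = C @ C.T + n * np.eye(n)            # Hessian of a hypothetical quadratic f
b = rng.standard_normal(n)
D = rng.standard_normal((n, n))
A = Q + 0.1 * (D + D.T)                # hypothetical symmetric approximation A_k of Q
xk = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ Q @ x - b @ x

def F(x):
    return Q @ x - b                   # gradient of f

def M(lam):
    return (1.0 - lam) * np.eye(n) + lam * A

def x_next(lam):
    return xk - np.linalg.solve(M(lam), F(xk))   # step (4.1) as a function of lam

def dphi(lam):
    """Analytic d/dlam f(x^{k+1}(lam)) = F(x^{k+1})^T (A - I) M^{-2} F(x^k)."""
    Minv = np.linalg.inv(M(lam))
    return F(x_next(lam)) @ (A - np.eye(n)) @ Minv @ Minv @ F(xk)

lam, eps = 0.6, 1e-6
analytic = dphi(lam)
fd = (f(x_next(lam + eps)) - f(x_next(lam - eps))) / (2 * eps)   # central difference
print(analytic, fd)                    # should agree to several digits
```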


Let $y^{k+1} = F(x^{k+1}) - F(x^k)$ and $p^{k+1} = x^{k+1} - x^k$. It is known [7, 16] that the quasi-Newton iteration formula

$$x^{k+1} = x^k - B_k F(x^k)$$

will converge to a stationary point of the quadratic function $f : \mathbb{R}^n \to \mathbb{R}$ in at most $n + 1$ iterations if the matrix $B_k$ is updated at each iteration so that it satisfies the secant relation

$$B_k y^i = p^i, \quad i = 1, 2, \ldots, k. \tag{4.3}$$


If $f$ is quadratic with Hessian matrix $Q$, then equation (4.3) is automatically satisfied for $B_k = Q^{-1}$. The secant relation also holds if $B_k$ is updated by the symmetric rank one formula; see [16]. We now derive a similar quadratic termination result for the iteration (4.1). This result was motivated by the work of Vial and Zang [25].
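The hereditary property of the symmetric rank one update mentioned above can be illustrated numerically. The sketch assumes NumPy, a randomly generated positive definite $Q$, and arbitrary random steps $p^i$ (all hypothetical test data); it updates an inverse-Hessian approximation $B$ with the standard SR1 formula $B \leftarrow B + (p - By)(p - By)^T / ((p - By)^T y)$ and then checks every secant relation (4.3). The guard that skips a nearly singular update is a common safeguard, not something taken from the paper.

```python
import numpy as np

def sr1_inverse_update(B, p, y, tol=1e-12):
    """Symmetric rank-one update of an inverse-Hessian approximation so that B_new @ y = p."""
    r = p - B @ y
    denom = r @ y
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(y):
        return B                       # safeguard: skip an ill-defined update
    return B + np.outer(r, r) / denom

rng = np.random.default_rng(2)
n = 5
C = rng.standard_normal((n, n))
Q = C @ C.T + np.eye(n)                # Hessian of a hypothetical quadratic f
B = np.eye(n)                          # initial approximation B_0
pairs = []
for i in range(n):
    p = rng.standard_normal(n)         # arbitrary step p^i (independent almost surely)
    y = Q @ p                          # y^i = Q p^i for a quadratic f
    B = sr1_inverse_update(B, p, y)
    pairs.append((p, y))

residuals = [np.linalg.norm(B @ y - p) for p, y in pairs]
print(max(residuals))                  # near machine precision: all relations (4.3) hold
print(np.allclose(B, np.linalg.inv(Q)))   # n independent steps pin B down to Q^{-1}
```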


Theorem 4.1. Suppose $f$ is quadratic and $x^{k+1}$ is given by formula (4.1), where $\lambda_k \in J$ satisfies (4.2). If $B_k = A_k^{-1}$ exists and satisfies the secant relation (4.3), then either $p^{k+1}$ is linearly independent of $p^1, \ldots, p^k$ or else $F(x^{k+1}) = 0$.


Proof. Suppose $p^{k+1}$ is linearly dependent on $p^1, \ldots, p^k$. Then there exist numbers $\beta_1, \ldots, \beta_k$, not all zero, such that

$$p^{k+1} = \sum_{i=1}^{k} \beta_i p^i. \tag{4.4}$$

If $f$ is the quadratic function $f(x) = a + b^Tx + \frac{1}{2}x^TQx$, we have $y^i = Qp^i$ and hence

$$y^{k+1} = Qp^{k+1} = \sum_{i=1}^{k} \beta_i Qp^i = \sum_{i=1}^{k} \beta_i y^i. \tag{4.5}$$

Since $B_k$ satisfies the secant relation (4.3), equations (4.4) and (4.5) imply

$$p^{k+1} = \sum_{i=1}^{k} \beta_i p^i = \sum_{i=1}^{k} \beta_i B_k y^i = B_k y^{k+1} = A_k^{-1}y^{k+1}.$$

Hence we obtain

$$p^{k+1} = A_k^{-1}[F(x^{k+1}) - F(x^k)]. \tag{4.6}$$


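However, from equation (4.1) we have

$$p^{k+1} = -[(1 - \lambda_k)I + \lambda_k A_k]^{-1}F(x^k). \tag{4.7}$$

Equations (4.6) and (4.7) then give

$$\begin{aligned} F(x^{k+1}) &= F(x^k) - A_k[(1 - \lambda_k)I + \lambda_k A_k]^{-1}F(x^k) \\ &= (I - A_k[(1 - \lambda_k)I + \lambda_k A_k]^{-1})F(x^k) \\ &= (1 - \lambda_k)(I - A_k)[(1 - \lambda_k)I + \lambda_k A_k]^{-1}F(x^k) \\ &= (1 - \lambda_k)[(1 - \lambda_k)I + \lambda_k A_k]^{-1}(I - A_k)F(x^k), \end{aligned}$$

where the third line uses $[(1 - \lambda_k)I + \lambda_k A_k] - A_k = (1 - \lambda_k)(I - A_k)$ and the last line holds since both $I$ and $A_k$ commute with $[(1 - \lambda_k)I + \lambda_k A_k]$. Hence it follows that

$$F(x^{k+1})^T[(1 - \lambda_k)I + \lambda_k A_k]^{-1}F(x^{k+1}) = (1 - \lambda_k)F(x^{k+1})^T(I - A_k)[(1 - \lambda_k)I + \lambda_k A_k]^{-2}F(x^k) = 0, \tag{4.8}$$

since $\lambda_k$ satisfies (4.2). But $\lambda_k \in J$ implies that the matrix $[(1 - \lambda_k)I + \lambda_k A_k]^{-1}$ is positive definite, and therefore (4.8) yields $F(x^{k+1}) = 0$, as asserted. □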
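The identity $F(x^{k+1}) = (1 - \lambda_k)(I - A_k)[(1 - \lambda_k)I + \lambda_k A_k]^{-1}F(x^k)$ obtained in this proof is easy to confirm numerically in the simplest case $A_k = Q$, where $A_k^{-1} = Q^{-1}$ satisfies every secant relation (4.3) automatically. The sketch assumes NumPy, a randomly generated quadratic (hypothetical test data) and an arbitrary $\lambda_k$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
C = rng.standard_normal((n, n))
Q = C @ C.T + np.eye(n)        # Hessian of a hypothetical quadratic f(x) = x'Qx/2 - b'x
b = rng.standard_normal(n)
A = Q                          # A_k = Q, so A_k^{-1} satisfies every secant relation

def F(x):
    return Q @ x - b           # gradient of f

xk, lam = rng.standard_normal(n), 0.35
M = (1.0 - lam) * np.eye(n) + lam * A
x_next = xk - np.linalg.solve(M, F(xk))                           # step (4.1)
lhs = F(x_next)
rhs = (1.0 - lam) * (np.eye(n) - A) @ np.linalg.solve(M, F(xk))   # identity from the proof
print(np.allclose(lhs, rhs))   # True
```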

Since there can be at most $n$ linearly independent vectors in $\mathbb{R}^n$, the following result is valid.


Corollary 4.2. Suppose that for each $k = 1, 2, \ldots, n$, $A_k^{-1}$ satisfies the secant relation (4.3) and it is possible to choose $\lambda_k$ satisfying equation (4.2) and such that the matrix $[(1 - \lambda_k)I + \lambda_k A_k]^{-1}$ is positive definite. Then the iteration formula (4.1) will obtain a stationary point of a quadratic function in at most $n + 1$ iterations.


Remark. The results of Sections 3 and 4 are also valid for the corresponding Levenberg-Marquardt algorithms. In this case equation (4.2) becomes

$$F(x^{k+1})^T[\mu_k I + A_k]^{-2}F(x^k) = 0.$$

The proofs of these results follow directly along the lines of those above.


5. Discussion of the direction parameter and some global convergence results

There are a variety of possible ways of choosing the direction parameter $\lambda_k$ in iteration formulas (3.2) or (4.1). One approach is to choose $\lambda_k$ by a predetermined rule which is based on previous information about the system. This type of selection is fundamental in many of the Levenberg-Marquardt or "dogleg" strategies that have been developed; see [6, 10, 11, 19].

One could also use formula (4.2), restricting $\lambda_k$ to the interval $J$ where the matrix $[(1 - \lambda)I + \lambda A_k]^{-1}$ is positive definite. The following results characterize this interval for symmetric matrices $A_k$.


Proposition 5.1. Suppose $A$ is symmetric. Then the following are equivalent:

(i) The matrix $[(1 - \lambda)I + \lambda A]$ is invertible for all $0 \le \lambda \le 1$.

(ii) The matrix $[(1 - \lambda)I + \lambda A]$ is positive definite for all $0 \le \lambda \le 1$.

(iii) $A$ is positive definite.


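Proof. Write $A = U\Sigma U^T$, where $UU^T = I$ and $\Sigma$ is a diagonal matrix with diagonal elements $\sigma_1, \sigma_2, \ldots, \sigma_n$, the eigenvalues of $A$; see [13, p. 36]. Then the matrix

$$[(1 - \lambda)I + \lambda A] = U[(1 - \lambda)I + \lambda\Sigma]U^T \tag{5.1}$$

has eigenvalues $1 - \lambda + \lambda\sigma_i$ for $i = 1, 2, \ldots, n$. Considering the determinant in (5.1), we see that $[(1 - \lambda)I + \lambda A]$ is invertible iff $1 - \lambda + \lambda\sigma_i \ne 0$ for all $i = 1, 2, \ldots, n$, i.e. iff $\lambda \ne 1/(1 - \sigma_i)$ for every $i$ with $\sigma_i \ne 1$ (an eigenvalue $\sigma_i = 1$ gives $1 - \lambda + \lambda\sigma_i = 1$ for every $\lambda$). But $1/(1 - \sigma_i) \notin [0, 1]$ iff $\sigma_i > 0$. Consequently, since $A$ is positive definite iff $\sigma_i > 0$ for $i = 1, 2, \ldots, n$, (i) and (iii) are equivalent. To see that (ii) and (iii) are equivalent, note that $1 - \lambda + \lambda\sigma_i > 0$, i.e. $1 > \lambda(1 - \sigma_i)$, for all $\lambda \in [0, 1]$ iff $\sigma_i > 0$. □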
Clearly the interval $J$ contains $\lambda = 1$ iff $A$ is positive definite. Note also that $\lambda = 0$ is in $J$ for every $A$. Therefore the gradient direction is a feasible direction at each iteration; this is not the case in the Levenberg-Marquardt algorithms.


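Proposition 5.2. Suppose $A$ is symmetric and its eigenvalues are $\sigma_1 \le \sigma_2 \le \cdots \le \sigma_n$. Let $J$ be the set of real numbers $\lambda$ such that the matrix $[(1 - \lambda)I + \lambda A]^{-1}$ is positive definite.

(i) If $\sigma_1 = \sigma_n = 1$, then $J = (-\infty, +\infty)$.

(ii) If $\sigma_1 < 1$ and $\sigma_n \le 1$, then $J = (-\infty, 1/(1 - \sigma_1))$.

(iii) If $\sigma_1 < 1 < \sigma_n$, then $J = (1/(1 - \sigma_n), 1/(1 - \sigma_1))$.

(iv) If $1 \le \sigma_1$ and $1 < \sigma_n$, then $J = (1/(1 - \sigma_n), +\infty)$.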

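Proof. Take $A = U\Sigma U^T$ as in the proof of Proposition 5.1. Note that if $\sigma_i = 1$ then $1 - \lambda + \lambda\sigma_i = 1$ for all $\lambda$. Hence part (i) is valid, and eigenvalues equal to 1 can be ignored in the remaining cases.

To prove part (ii), let $\sigma_1 \le \sigma_i < 1$; then $1 - \sigma_1 \ge 1 - \sigma_i > 0$. So $1/(1 - \sigma_i) \ge 1/(1 - \sigma_1) > 0$. Therefore, if $\lambda < 1/(1 - \sigma_1) \le 1/(1 - \sigma_i)$, we have $\lambda(1 - \sigma_i) < 1$, or $0 < 1 - \lambda + \lambda\sigma_i$. For the converse, note that if $\lambda \ge 1/(1 - \sigma_1)$ we have $0 \ge 1 - \lambda + \lambda\sigma_1$, which implies that $[(1 - \lambda)I + \lambda A]$ has a nonpositive eigenvalue.

To prove part (iv), consider $1 < \sigma_i \le \sigma_n$. Then $0 > 1 - \sigma_i \ge 1 - \sigma_n$, and so $1/(1 - \sigma_i) \le 1/(1 - \sigma_n) < 0$. Therefore, if $\lambda > 1/(1 - \sigma_n) \ge 1/(1 - \sigma_i)$, we have $\lambda(1 - \sigma_i) < 1$, or $0 < 1 - \lambda + \lambda\sigma_i$. For the converse, note that if $\lambda \le 1/(1 - \sigma_n)$ we have $\lambda(1 - \sigma_n) \ge 1$, or $0 \ge 1 - \lambda + \lambda\sigma_n$, which implies that $[(1 - \lambda)I + \lambda A]$ has a nonpositive eigenvalue.

The proof of part (iii) follows immediately from the above two cases. □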
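Proposition 5.2 translates directly into a small routine. The sketch below, assuming NumPy, computes the end points of $J$ from the extreme eigenvalues of a symmetric $A$ and checks positive definiteness of $(1-\lambda)I + \lambda A$ directly, which is equivalent to positive definiteness of its inverse; the diagonal matrix in the example is hypothetical and falls under case (iii), so $J = (-1/2, 1/2)$.

```python
import numpy as np

def interval_J(A):
    """End points of J in Proposition 5.2 for symmetric A (infinite ends as +-inf)."""
    sig = np.linalg.eigvalsh(A)          # ascending: sig[0] = sigma_1, sig[-1] = sigma_n
    lo = 1.0 / (1.0 - sig[-1]) if sig[-1] > 1.0 else -np.inf
    hi = 1.0 / (1.0 - sig[0]) if sig[0] < 1.0 else np.inf
    return lo, hi

def is_pd(A, lam):
    """True if (1 - lam) I + lam A, equivalently its inverse, is positive definite."""
    n = A.shape[0]
    return bool(np.linalg.eigvalsh((1.0 - lam) * np.eye(n) + lam * A).min() > 0)

A = np.diag([-1.0, 0.5, 3.0])          # hypothetical example of case (iii)
lo, hi = interval_J(A)
print(lo, hi)                           # J = (1/(1-3), 1/(1-(-1))) = (-0.5, 0.5)
print([is_pd(A, lam) for lam in (-0.51, -0.49, 0.0, 0.49, 0.51)])
# [False, True, True, True, False]
```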
From the proof of Proposition 5.2 one can see that the magnitude of the vector

$$x^{k+1}(\lambda) = x^k - [(1 - \lambda)I + \lambda A_k]^{-1}F(x^k)$$

can be expected to increase, usually without bound, as $\lambda$ approaches either end point of $J$. In this case the continuity of $f[x^{k+1}(\lambda)]$ with respect to $\lambda$ would indicate the existence of a minimum of this function in $J$, that is, a value of $\lambda$ which satisfies (4.2).

Another approach, which would avoid solving equation (4.2), is to use a search technique to find a number that approximates the value of $\lambda_k$ for which $f[x^{k+1}(\lambda)]$ achieves its minimum over $J$. This approach for determining $\lambda_k$ appears to have promise computationally, since it produces an approximation to the best direction available when using a positive definite matrix in the iteration formula.
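One way to realize such a search is a golden-section search over a bounded subinterval of $J$, sketched below under stated assumptions. The sketch assumes NumPy, the hypothetical nonconvex test function $f(x, y) = x^4/4 - x^2/2 + y^2/2$ with $A_k = F_x(x^k)$, a cap `xi` on infinite end points of $J$, and a shrink factor that keeps the search strictly inside $J$; none of these choices come from the paper, and golden-section search only locates the minimizer reliably when $f[x^{k+1}(\lambda)]$ is unimodal on the search interval, as it is here. At the point $(0.1, 1)$ the Hessian is indefinite, yet the selected $\lambda$ gives a step that lands very close to the minimizer $(1, 0)$.

```python
import math
import numpy as np

def f(z):
    """Hypothetical test function f(x, y) = x^4/4 - x^2/2 + y^2/2."""
    x, y = z
    return x**4 / 4 - x**2 / 2 + y**2 / 2

def grad(z):
    x, y = z
    return np.array([x**3 - x, y])

def hess(z):
    x, _ = z
    return np.array([[3 * x**2 - 1, 0.0], [0.0, 1.0]])

def golden_min(phi, a, b, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal phi on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if phi(c) < phi(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def search_lambda(z, xi=10.0, shrink=0.999):
    """Approximately minimize phi(lam) = f(x^{k+1}(lam)) over a capped, shrunken part of J."""
    A, g = hess(z), grad(z)
    sig = np.linalg.eigvalsh(A)                      # ascending eigenvalues
    lo = max(-xi, 1 / (1 - sig[-1])) if sig[-1] > 1 else -xi
    hi = min(xi, 1 / (1 - sig[0])) if sig[0] < 1 else xi
    lo, hi = shrink * lo, shrink * hi                # lo < 0 < hi, so this moves inward

    def phi(lam):
        M = (1 - lam) * np.eye(2) + lam * A
        return f(z - np.linalg.solve(M, g))

    lam = golden_min(phi, lo, hi)
    return lam, z - np.linalg.solve((1 - lam) * np.eye(2) + lam * A, g)

lam_k, z_next = search_lambda(np.array([0.1, 1.0]))
print(lam_k, z_next)   # lam_k near 0.452, z_next near (1, 0)
```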

This is further supported by the following global convergence property for the dampened iteration formula

$$x^{k+1} = x^k - \alpha_k[(1 - \lambda_k)I + \lambda_k F_x(x^k)]^{-1}F(x^k). \tag{5.3}$$


The dampening term $\alpha_k$ is introduced since the undampened version, as in Newton's method, need not yield a descent step for a general $f$ except in an appropriate neighborhood of $x^*$. For the next results we let $J(x)$ be the set of real numbers $\lambda$ for which the matrix $[(1 - \lambda)I + \lambda F_x(x)]^{-1}$ is positive definite. Since $F_x(x)$ is symmetric and the eigenvalues of a matrix vary continuously with its components, Proposition 5.2 shows that $J(x)$ is an interval which depends continuously on $x$. Hence there exists a continuous selection function $\lambda : \mathbb{R}^n \to \mathbb{R}$ such that $\lambda(x) \in J(x)$ for all $x$ [15, Th. 2.1]. Indeed, there are many such functions, and we will demonstrate one shortly. Using such a function $\lambda(x)$, iteration formula (5.3) can be employed as follows:


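i) Given a point $x^k$, set $\lambda_k = \lambda(x^k)$;

ii) Define

$$x^{k+1}(\lambda_k, \alpha) = x^k - \alpha[(1 - \lambda_k)I + \lambda_k F_x(x^k)]^{-1}F(x^k)$$

and let $\alpha_k$ minimize $f[x^{k+1}(\lambda_k, \alpha)]$ over $\alpha \ge 0$;

iii) Set $x^{k+1} = x^{k+1}(\lambda_k, \alpha_k)$.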


Theorem 5.3. Suppose $\lambda(x)$ is a continuous selection function for $J$ and let the sequence $\{x^k\}$ be defined by iteration formula (5.3), where we set $\lambda_k = \lambda(x^k)$ and let $\alpha_k$ minimize $f[x^{k+1}(\lambda_k, \alpha)]$ over $\alpha \ge 0$. Then the limit of any convergent subsequence of $\{x^k\}$ is a solution of (2.1). In particular, if $\{x^k\}$ is bounded, it will have limit points and each of these will be a solution of (2.1).

Proof. Let $D : \mathbb{R}^n \to \mathbb{R}^{2n}$ be a map which associates to every point $x^k$ the pair $(x^k, d^k)$, where $d^k$ is the direction

$$d^k = -[(1 - \lambda_k)I + \lambda_k F_x(x^k)]^{-1}F(x^k).$$

Note that since $\lambda_k \in J(x^k)$, $d^k$ is a direction of descent for $f$ whenever $F(x^k) \ne 0$. Further, since $\lambda(x)$ is continuous, the map $D$ is a continuous point-to-set map. Let $S : \mathbb{R}^{2n} \to \mathbb{R}^n$ be a point-to-set map defined by

$$S(x, d) = \{y : y = x + \alpha d \text{ for some } \alpha \ge 0,\ f(y) = \min_{0 \le \alpha < \infty} f(x + \alpha d)\}.$$

Then $\lambda_k \in J(x^k)$ implies $d^k \ne 0$ whenever $F(x^k) \ne 0$, and so $S$ is a closed map on the range of $D$ [12, p. 146]. Hence the composite map $S \circ D$ is closed, and the limit of any convergent subsequence of the iteration

$$x^{k+1} \in S \circ D(x^k)$$

is a solution of (2.1) [12, p. 125]. By the Bolzano-Weierstrass theorem, every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence; hence if $\{x^k\}$ is bounded it has a convergent subsequence. This completes the proof. □

The success which quadratic approximation of functions has enjoyed in optimization techniques motivates the following example of a continuous selection function $\lambda(x)$. Approximate the function $f[x^{k+1}(\lambda)]$ by a quadratic function whose coefficients are determined by three values of $\lambda$ which depend continuously on the end points of the interval $J(x^k)$, such as the values $\lambda_1 = 0$, $\lambda_2 = (0.1)\gamma$, where $\gamma = \max\{-\xi, \text{left end point of } J(x^k)\}$, and $\lambda_3 = (0.1)\delta$, where $\delta = \min\{\xi, \text{right end point of } J(x^k)\}$, for specified $\epsilon, \xi$. (By Proposition 5.2 these end points are $1/(1 - \sigma_n)$ and $1/(1 - \sigma_1)$ when they are finite, with $\sigma_1$ and $\sigma_n$ the smallest and largest eigenvalues of $F_x(x^k)$.) Then let $\lambda(x^k)$ be the minimum point of this quadratic function, or an appropriate bound in case the minimum point does not lie in $J(x^k)$ or lies too close to one end point of $J(x^k)$. Other approaches for defining the selection function $\lambda(x)$ are motivated by the techniques described in [6, 10, 19, 21].
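A sketch of this quadratic-interpolation selection rule follows, assuming NumPy and the hypothetical test function $f(x, y) = x^4/4 - x^2/2 + y^2/2$ with $A_k = F_x(x^k)$. The cap `xi`, the fallback to the best of the three sampled values when the fitted parabola is not convex, and the clipping of $\lambda$ to $[0.9\gamma, 0.9\delta]$ are illustrative stand-ins for the unspecified "appropriate bound", not choices taken from the paper; the fallback in particular need not preserve continuity of $\lambda(x)$.

```python
import numpy as np

def f(z):
    """Hypothetical test function f(x, y) = x^4/4 - x^2/2 + y^2/2."""
    x, y = z
    return x**4 / 4 - x**2 / 2 + y**2 / 2

def grad(z):
    x, y = z
    return np.array([x**3 - x, y])

def hess(z):
    x, _ = z
    return np.array([[3 * x**2 - 1, 0.0], [0.0, 1.0]])

def J_endpoints(A):
    """End points of J from Proposition 5.2 (infinite ends returned as +-inf)."""
    sig = np.linalg.eigvalsh(A)
    lo = 1 / (1 - sig[-1]) if sig[-1] > 1 else -np.inf
    hi = 1 / (1 - sig[0]) if sig[0] < 1 else np.inf
    return lo, hi

def select_lambda(z, xi=10.0):
    """Quadratic-interpolation choice of lambda(x^k) through 0.1*gamma, 0 and 0.1*delta."""
    A, g = hess(z), grad(z)
    lo, hi = J_endpoints(A)
    gamma, delta = max(-xi, lo), min(xi, hi)        # gamma < 0 < delta

    def phi(lam):
        return f(z - np.linalg.solve((1 - lam) * np.eye(len(z)) + lam * A, g))

    a, b, c = 0.1 * gamma, 0.0, 0.1 * delta
    fa, fb, fc = phi(a), phi(b), phi(c)
    d1 = (fb - fa) / (b - a)                        # first divided difference
    d2 = ((fc - fb) / (c - b) - d1) / (c - a)       # second divided difference
    if d2 > 0:
        lam = 0.5 * (a + b) - d1 / (2 * d2)         # vertex of the interpolating parabola
    else:
        lam = min((fa, a), (fb, b), (fc, c))[1]     # fallback: best sampled value
    return float(np.clip(lam, 0.9 * gamma, 0.9 * delta)), (lo, hi)

lam, (lo, hi) = select_lambda(np.array([0.1, 1.0]))
print(lam, lo, hi)   # lam lies strictly inside J = (lo, hi)
```

Because $\gamma < 0 < \delta$ and both are at least as close to 0 as the true end points, clipping to $[0.9\gamma, 0.9\delta]$ always returns a value strictly inside $J(x^k)$.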

We now consider iteration formula (5.3) where the dampening term $\alpha_k$ is not an exact minimization step size. Instead we use an Armijo step size procedure, which is based on sufficient function value decrease [1]. For this result we let $\|\cdot\|$ denote the Euclidean norm of $\mathbb{R}^n$.
Theorem 5.4. Suppose $\lambda(x)$ is a continuous selection function for $J$ and define

$$S(x, \delta) = \{x_\alpha : x_\alpha = x - \alpha[(1 - \lambda(x))I + \lambda(x)F_x(x)]^{-1}F(x),\ \alpha > 0,\ f(x_\alpha) - f(x) \le -\delta\|F(x)\|^2\}.$$

Assume the following conditions hold:

i) The initial point $x^0$ is given and is such that the set $S(x^0) = \{x : f(x) \le f(x^0)\}$ is bounded.

ii) There exists a constant $K$ such that $\|F(y) - F(x)\| \le K\|y - x\|$ for all $x, y \in S(x^0)$.

Then there exists a $\delta > 0$ such that for any $x \in S(x^0)$ the set $S(x, \delta)$ is a nonempty subset of $S(x^0)$. Further, for any such $\delta$ and any sequence $\{x^k\}$ such that

$$x^{k+1} \in S(x^k, \delta), \quad k = 0, 1, 2, \ldots,$$

the sequence has at least one convergent subsequence and the limit of any convergent subsequence is a solution of (2.1).
Proof. We let $P(x)$ denote the matrix

$$P(x) = [(1 - \lambda(x))I + \lambda(x)F_x(x)]^{-1},$$

and $\|P(x)\|$ is the operator norm induced by the Euclidean norm on $\mathbb{R}^n$. Thus $\|P(x)\|$ is a continuous real valued function on $\mathbb{R}^n$ and hence is bounded on the closure of $S(x^0)$, say

$$\|P(x)\| \le M$$

for all $x \in S(x^0)$. Similarly, there is an $N > 0$ such that

$$\|P(x)^{-1}\| \le N$$

for all $x \in S(x^0)$. Take $\delta = (4KM^2N^2)^{-1}$ and let $x \in S(x^0)$. For any fixed $\alpha$ the mean value theorem implies there is a point $\bar{x}$ on the line segment joining $x$ and $x_\alpha$ such that

$$\begin{aligned} f(x_\alpha) - f(x) &= (x_\alpha - x)^T F(\bar{x}) \\ &= -\alpha[P(x)F(x)]^T[F(x) + F(\bar{x}) - F(x)] \\ &= -\alpha F(x)^T P(x)F(x) - \alpha F(x)^T P(x)[F(\bar{x}) - F(x)] \\ &\le -\alpha F(x)^T P(x)F(x) + \alpha\|F(x)\|MK\|x_\alpha - x\| \\ &\le -\alpha F(x)^T P(x)F(x) + \alpha^2KM^2\|F(x)\|^2, \end{aligned}$$

using (ii) and $\|x_\alpha - x\| = \alpha\|P(x)F(x)\| \le \alpha M\|F(x)\|$. Since $P(x)$ is a symmetric positive definite matrix for each $x$, we can write $P(x) = G^TG$ for some matrix $G$. Letting $\rho(A)$ denote the spectral radius of the matrix $A$, we have the following inequality holding for each $y \in \mathbb{R}^n$ [24, p. 11]:

$$y^TP(x)y = \|Gy\|^2 \ge \frac{\|y\|^2}{\rho(G^{-1}G^{-T})} = \frac{\|y\|^2}{\rho(P^{-1}(x))} = \frac{\|y\|^2}{\|P^{-1}(x)\|} \ge \frac{\|y\|^2}{N}.$$

Hence we have

$$f(x_\alpha) - f(x) \le -\frac{\alpha}{N}\|F(x)\|^2 + \alpha^2KM^2\|F(x)\|^2 = -\delta\|F(x)\|^2$$

for $\alpha = (2KM^2N)^{-1}$. Thus $S(x, \delta)$ is not empty.

Now, let $\delta > 0$ be such that $S(x, \delta)$ is nonempty for any $x \in S(x^0)$, and suppose the sequence $\{x^k\}$ is such that

$$x^{k+1} \in S(x^k, \delta), \quad k = 0, 1, 2, \ldots.$$

Then the definition of $S(x, \delta)$ implies that the sequence $\{f(x^k)\}$, which is bounded below by the minimum value of $f$ on the closure of $S(x^0)$, is monotone nonincreasing and hence converges. Since

$$f(x^{k+1}) - f(x^k) \le -\delta\|F(x^k)\|^2 \le 0, \tag{5.4}$$

the sequence $\{F(x^k)\}$ converges to the zero vector. The result now follows, since $S(x^0)$ is bounded. □

This result indicates that a good procedure for determining the dampening term $\alpha_k$ is to let $\alpha_k$ be the first number in the sequence $(0.1)^j$, $j = 0, 1, 2, \ldots$, that satisfies the condition

$$f(x^{k+1}) - f(x^k) \le -\epsilon\alpha_k\|F(x^k)\|^2,$$

for an appropriate small number $\epsilon > 0$, say $\epsilon = 0.0001$, as is suggested by Powell [22, (3.5)]. This type of procedure was proposed by Fletcher [8] and is a successful method in practice (see also [17, p. 503]). The value $\alpha_k = 1$ is used first since we eventually expect R-quadratic convergence by Corollary 3.6, provided the selection function $\lambda(x)$ approaches the value 1 near the solution. In fact, this would be expected when using either equation (4.2) or the quadratic approximation approach described above, since they are designed to obtain the best direction when using unit step length. Thus these selection functions would favor Newton's direction near a solution.
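The complete dampened iteration (5.3) with this backtracking rule might be sketched as follows. The sketch assumes NumPy, the hypothetical test function $f(x, y) = x^4/4 - x^2/2 + y^2/2$, and a hypothetical continuous selection function $\lambda(x) = \min\{1, 0.9/(1 - \sigma_1(x))\}$ when $\sigma_1(x) < 1$ and $\lambda(x) = 1$ otherwise, where $\sigma_1(x)$ is the smallest eigenvalue of $F_x(x)$. By Proposition 5.2 this choice always lies in $J(x)$, and it equals 1 (Newton's direction) wherever $\sigma_1(x) \ge 0.1$; it is an illustrative choice, not one proposed in the paper. The iteration cap and the limit on backtracking steps are safeguards added for the sketch.

```python
import numpy as np

def f(z):
    """Hypothetical test function f(x, y) = x^4/4 - x^2/2 + y^2/2."""
    x, y = z
    return x**4 / 4 - x**2 / 2 + y**2 / 2

def grad(z):
    x, y = z
    return np.array([x**3 - x, y])

def hess(z):
    x, _ = z
    return np.array([[3 * x**2 - 1, 0.0], [0.0, 1.0]])

def select_lambda(z):
    """Hypothetical continuous selection lambda(x) that always lies in J(x)."""
    s1 = np.linalg.eigvalsh(hess(z))[0]            # smallest eigenvalue of F_x(x)
    return min(1.0, 0.9 / (1.0 - s1)) if s1 < 1.0 else 1.0

def damped_homotopy(z0, eps=1e-4, tol=1e-10, max_iter=100, max_backtracks=30):
    """Iteration (5.3) with alpha_k the first (0.1)**j giving sufficient decrease."""
    z = np.array(z0, dtype=float)
    lams = []
    for _ in range(max_iter):
        g = grad(z)
        if np.linalg.norm(g) < tol:
            break
        lam = select_lambda(z)
        lams.append(lam)
        d = -np.linalg.solve((1 - lam) * np.eye(len(z)) + lam * hess(z), g)
        for j in range(max_backtracks):
            alpha = 0.1 ** j
            z_new = z + alpha * d
            if f(z_new) - f(z) <= -eps * alpha * (g @ g):   # sufficient decrease test
                break
        else:
            break   # no acceptable step found; stop the sketch here
        z = z_new
    return z, lams

z_final, lams = damped_homotopy([0.1, 1.0])
print(z_final, lams)   # ends near (1, 0); the final lambdas equal 1 (Newton steps)
```

At the starting point the Hessian is indefinite, so the first direction uses $\lambda < 1$; once the iterates enter the region where $\sigma_1(x) \ge 0.1$ the selection returns 1 and the unit step is accepted, which is the behavior the text anticipates near a solution.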

It should be noted, however, that R-linear convergence can be guaranteed for any continuous selection function, provided equation (2.1) has a finite number of solutions and the Hessian matrix $F_x(x^*)$ is nonsingular at these points. In fact, under these assumptions the results of Ostrowski ([18]; see also [17, Theorems 14.1.5 and 14.1.6]) imply that the sequence $\{x^k\}$ itself will converge to a solution of (2.1) with at least an R-linear rate of convergence. This result follows immediately from those of Ostrowski using equation (5.4).

Theorems 3.5, 5.3 and 5.4 motivate a number of other iteration schemes which should have good convergence properties even when $\lambda_k$, $\alpha_k$ and $F_x(x^k)$ are approximated appropriately. Naturally, the success of such schemes depends on how the various details are implemented. However, these results will rely on computational experience and so will be reported separately.